SNIP1 and PRC2 coordinate cell fates of neural progenitors during brain development

Stem cell survival versus death is a developmentally programmed process essential for morphogenesis, sizing, and quality control of genome integrity and cell fates. Cell death is pervasive during development, but its programming is little known. Here, we report that Smad nuclear interacting protein 1 (SNIP1) promotes neural progenitor cell survival and neurogenesis and is, therefore, integral to brain development. The SNIP1-depleted brain exhibits dysplasia with robust induction of caspase 9-dependent apoptosis. Mechanistically, SNIP1 regulates target genes that promote cell survival and neurogenesis, and its activities are influenced by TGFβ and NFκB signaling pathways. Further, SNIP1 facilitates the genomic occupancy of Polycomb complex PRC2 and instructs H3K27me3 turnover at target genes. Depletion of PRC2 is sufficient to reduce apoptosis and brain dysplasia and to partially restore genetic programs in the SNIP1-depleted brain in vivo. These findings suggest a loci-specific regulation of PRC2 and H3K27 marks to toggle cell survival and death in the developing brain.

Software and code
Policy information about availability of computer code

Data collection
For FACS analysis, data were collected with the software (Yurika: software name and version #). For IF analysis, images were captured on a Nikon C2 using NIS-Elements 5.30.05 (Build 1559). For WB analysis, images were captured on an Odyssey® Fc imaging system (LI-COR) using Image Studio™ software version 1.0.14.

Data analysis
Image analysis was performed with the ImageJ2/Fiji software, version 2.9.0/1.53t. Statistical analyses were performed using R version 4.0.1 or Prism version 9.0.2. Code for the CUT&RUN analyses is available at https://doi.org/10.6084/m9.figshare.7411835.

RNA-seq
Raw reads were first trimmed using TrimGalore (version 0.6.3), available at https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/, with parameters '--paired --retain_unpaired'. Filtered reads were then mapped to the Mus musculus reference genome (GRCm38.p6 + Gencode-M22 Annotation) using STAR (version 2.7.9a) (Dobin and Gingeras, 2015) [PMID: 26334920]. Gene-level read quantification was done using RSEM (version 1.3.1) (Li and Dewey, 2011). To identify differentially expressed genes between control and experimental samples, the variation in library size between samples was first normalized by the trimmed mean of M values (TMM), and genes with CPM < 1 in all samples were eliminated. The normalized data were then fitted by linear modeling with the voom function from the limma R package (Law et al., 2014). Gene set enrichment analysis (GSEA) was performed using the MSigDB database (version 7.1), with differentially expressed genes ranked by their log2(FC) * -log10(p-value) (Liberzon et al., 2015; Subramanian et al., 2005).
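The expression filter and the GSEA ranking metric described above can be sketched in a few lines of Python. This is only an illustration, not the authors' R code: the function names and the toy count table are hypothetical, and CPM is computed here from raw library sizes, without the TMM scaling applied in the actual analysis.

```python
import math


def counts_per_million(counts):
    """Convert raw counts {gene: [count per sample]} to CPM.

    Assumption: plain library-size scaling; TMM factors are NOT applied here.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_low_expression(counts, min_cpm=1.0):
    """Drop genes whose CPM is below min_cpm in all samples."""
    cpm = counts_per_million(counts)
    return {
        gene: row
        for gene, row in counts.items()
        if any(value >= min_cpm for value in cpm[gene])
    }


def rank_score(log2_fc, p_value):
    """Ranking metric log2(FC) * -log10(p-value); p_value must be > 0."""
    return log2_fc * -math.log10(p_value)


# Hypothetical toy table: two samples, three genes.
toy_counts = {"GeneA": [999_999, 999_999], "GeneB": [1, 1], "GeneC": [0, 0]}
kept = filter_low_expression(toy_counts)
```

Because the score keeps the sign of the fold change, strongly up-regulated and strongly down-regulated genes end up at opposite ends of the ranked list.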

CUT&RUN
The reads were aligned to the mouse mm10 and fruit fly dm6 reference genomes by BWA (version 0.7.170.7.12, default parameters). Duplicated reads were marked by bamsormadup from the biobambam tool (version 2.0.87), available at https://www.sanger.ac.uk/tool/biobambam/. Uniquely mapped reads were kept by samtools (version 1.14, parameters "-q 1 -F 1804"). Fragments < 2000 bp were kept for peak calling, and bigwig files were generated for visualization. SICER (Xu et al., 2014) and macs2 (Zhang et al., 2008) were both used for peak calling so that both narrow and broad peaks were identified correctly. With SICER, we assigned peaks in the top 1 percentile as high-confidence peaks and peaks in the top 5 percentile as low-confidence peaks. Two sets of peaks were generated: strong peaks, called with parameter 'FDR < 0.05' by at least one method (macs2 or SICER), and weak peaks, called with parameter 'FDR < 0.5' by at least one method (macs2 or SICER). Peaks were considered reproducible if they were supported by a strong peak in all replicates, or by at least one strong peak and a weak peak in the other replicates. For downstream analyses, heatmaps were generated by deepTools (Ramirez et al., 2014), and gene ontology analysis was performed with Enrichr (Chen et al., 2013; Kuleshov et al., 2016) and GSEA, in addition to custom R scripts. For differential peak analysis, peaks from two replicates were merged, and the number of overlapping extended reads was counted for each sample (bedtools v2.24.0) (Quinlan and Hall, 2010). We then detected differential peaks by the empirical Bayes method (eBayes function from the limma R package) (Law et al., 2014).
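The reproducibility rule stated above can be expressed as a small predicate. The sketch below is one reading of that rule and not the authors' script: each replicate is assumed to have been reduced to a hypothetical per-peak label of "strong", "weak", or "none", and a peak passes when at least one replicate is strong and every other replicate is at least weak.

```python
def is_reproducible(calls):
    """Return True if a peak is reproducible across replicates.

    calls: one label per replicate, each "strong", "weak", or "none"
    (hypothetical encoding of the FDR < 0.05 and FDR < 0.5 peak sets).
    """
    if not calls:
        return False
    has_strong = any(call == "strong" for call in calls)
    all_supported = all(call in ("strong", "weak") for call in calls)
    # "Strong in all replicates" is the special case where every label is strong.
    return has_strong and all_supported


# Usage with two replicates, the most common design in this study.
examples = {
    ("strong", "strong"): is_reproducible(["strong", "strong"]),
    ("strong", "weak"): is_reproducible(["strong", "weak"]),
    ("weak", "weak"): is_reproducible(["weak", "weak"]),
    ("strong", "none"): is_reproducible(["strong", "none"]),
}
```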
Peaks were annotated based on Gencode following this priority: "Promoter.Up" if they fall within TSS - 2 kb; "Promoter.Down" if they fall within TSS + 2 kb; "exonic" or "intronic" if they fall within an exon or intron of any isoform; "TES peaks" if they fall within 2 kb of the TES; "distal5" or "distal3" if they are within 50 kb upstream of the TSS or 50 kb downstream of the TES, respectively; and "intergenic" if they do not fit any of the previous categories.
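The annotation priority can be illustrated with a first-match classifier. This is a simplified sketch under several assumptions that are not in the original text: a single plus-strand gene, each peak reduced to one position, "Promoter.Down" read as the 2 kb downstream of the TSS, and the TES window read as 2 kb on either side. The `Gene` class and the coordinates are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    """Hypothetical plus-strand gene model (tss < tes)."""
    tss: int
    tes: int
    exons: list = field(default_factory=list)  # list of (start, end) tuples


def annotate_peak(pos, gene, promoter=2_000, tes_window=2_000, distal=50_000):
    """Return the first matching category, checked in priority order."""
    if gene.tss - promoter <= pos < gene.tss:
        return "Promoter.Up"
    if gene.tss <= pos <= gene.tss + promoter:  # assumed downstream window
        return "Promoter.Down"
    if gene.tss <= pos <= gene.tes:
        in_exon = any(start <= pos <= end for start, end in gene.exons)
        return "exonic" if in_exon else "intronic"
    if abs(pos - gene.tes) <= tes_window:  # assumed symmetric window
        return "TES peaks"
    if gene.tss - distal <= pos < gene.tss:
        return "distal5"
    if gene.tes < pos <= gene.tes + distal:
        return "distal3"
    return "intergenic"


gene = Gene(tss=100_000, tes=120_000,
            exons=[(100_000, 100_500), (110_000, 110_400), (119_000, 120_000)])
labels = [annotate_peak(p, gene) for p in (99_000, 101_000, 110_200, 105_000)]
```

Because the checks run in priority order, a position inside the gene body is labelled exonic or intronic before the TES window is considered, so "TES peaks" in this sketch are effectively those just downstream of the TES.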

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A description of any restrictions on data availability
- For clinical datasets or third party data, please ensure that the statement adheres to our policy

All sequencing data are deposited in GEO: GSE212445. Reads were mapped to the Mus musculus reference genome (GRCm38.p6 + Gencode-M22 Annotation).

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
Sample sizes were determined by statistical power analysis using the online tool at powerandsamplesize.com/Calculators. Most calculations were set up to compare 2 means under a 2-sample, 1-sided assumption.
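For orientation, a 2-sample, 1-sided comparison of 2 means is commonly sized with the normal-approximation formula n = 2 * ((z_(1-alpha) + z_(power)) * sd / delta)^2 per group. The sketch below applies that textbook formula with equal group sizes. The significance level, power, effect size, and standard deviation are illustrative assumptions only, since the values used in the study are not stated here, and the online calculator may differ in detail.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Sample size per group for a 2-sample, 1-sided comparison of means.

    delta: expected difference between group means (assumed value)
    sd:    common standard deviation (assumed value)
    Uses the normal approximation with equal group sizes.
    """
    if delta == 0 or sd <= 0:
        raise ValueError("delta must be non-zero and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # 1-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / abs(delta)) ** 2)


# Illustrative inputs: a difference of one standard deviation.
n_example = n_per_group(delta=1.0, sd=1.0)
```

Doubling the expected difference reduces the required sample size roughly fourfold, which is why large, structurally obvious phenotypes can be tested with few embryos per group.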

Data exclusions
No data were excluded.

Replication
Observations were repeated at least 2 times. RNA-seq was performed on wild-type, Snip1 conditional KO, and Snip1-Eed conditional KO cells with at least 3 replicates. Chromatin profiling by CUT&RUN was performed on wild-type vs. Snip1 conditional KO mouse NPCs, mostly with 2 replicates. Replications were successful.

Randomization
We did not perform randomization, which is not relevant to this study.
Experimental groups were wild-type, Snip1 conditional KO, and Snip1-Eed conditional KO embryos. The groups were analyzed by pairwise comparisons to address specific hypotheses. Given the nature of these hypotheses (biological questions), randomization is not applicable.

Blinding
During data collection, investigators could not be blinded to group allocation. Blinding was not possible because the experimental groups (wild-type, Snip1 conditional KO, and Snip1-Eed conditional KO embryos) were structurally distinct from each other. Snip1 conditional KO embryos had severe brain tissue atrophy, which was structurally distinct from wild-type and also different from Snip1-Eed conditional KO. From sample collection onward, experimentalists saw the structurally variant samples and could not be blinded. Data analysis was not blinded. However, sequencing data were analyzed in an unsupervised manner, without assumptions about data trends.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Antibodies
All antibodies were purchased from commercial vendors based on previous validation in publications from reputable labs, other sources, or the ENCODE project. For most antibodies, western blotting and immunofluorescence were performed to ensure that the antibody recognizes 1 protein at the right size or that immunofluorescence signals localize to the right structures in the cells. For Snip1 antibodies, protein depletion was used in western blotting for validation. More detailed validation information is listed below.
Anti-Jarid2, murine knockout validated by Novus Biologicals (https://www.novusbio.com/products/jumonji-jarid2-antibody_nb100-2214)
Anti-Ezh2, murine reactivity tested by Active Motif (https://www.activemotif.com/catalog/details/39933/ezh2-antibody-pab)
Anti-Ezh2, murine reactivity tested by Active Motif (https://www.activemotif.com/catalog/details/39875/ezh2-antibody-mab-clone-ac22)

Eukaryotic cell lines
Policy information about cell lines and Sex and Gender in Research
Cell line source(s)

nature portfolio | reporting summary

March 2021
Animals aged 2-12 months were used in timed matings to yield embryos at E11.5-13.5 for analyses.

Wild animals
None.

Reporting on sex
Male and female mouse embryos were randomly assigned in this study. Because our genes of interest are not located on any of the sex chromosomes, phenotypes and/or molecular observations are expected to be similar between males and females in our study.

Field-collected samples
None.

Ethics oversight
The IACUC at St. Jude approved and oversaw the breeding and use of the mice in this study.
Note that full information on the approval of the study protocol must also be provided in the manuscript.

ChIP-seq

Data deposition
Confirm that both raw and final processed data have been deposited in a public database such as GEO.
Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.

Data access links
May remain private before publication.

Data quality
We visualized CUT&RUN peaks in the Integrative Genomics Viewer (IGV; Broad Institute) to validate called peaks and check the consistency among replicates. We also calculated the Pearson correlation coefficient among replicates, which suggested high reproducibility.
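A replicate correlation of this kind can be computed as sketched below. The example assumes that each replicate has already been summarized as a vector of signal values over the same genomic bins. The binning scheme and the numbers are hypothetical, and the source does not say which signal was correlated.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical binned coverage for two replicates over the same five bins.
rep1 = [12.0, 30.5, 3.2, 45.1, 8.8]
rep2 = [10.9, 33.0, 2.7, 41.6, 9.5]
r = pearson(rep1, rep2)
```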

Flow Cytometry
Plots
Confirm that:
The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).
The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).
All plots are contour plots with outliers or pseudocolor plots.
A numerical value for number of cells or percentage (with statistics) is provided.

FACS-based cell death assay
5 × 10^5 NPCs from Snip1[+/+] and Snip1[flox/flox] embryos were seeded onto each well of matrigel-coated 6-well plates. On the following day, cells were incubated with mCherry-Cre lentivirus (Vector Core Lab at St. Jude Children's Research Hospital) for 8 hours, washed twice with 1X PBS, and cultured for 3 days. To quantify the population of cells with active caspases 3 and 7, cells were incubated at 37 °C with reconstituted FAM-FLICA® at a 1:300 dilution (ImmunoChemistry Technologies 94) for 30 min. Cells were fixed in a 4% formaldehyde solution at room temperature for 15 min and washed twice with 1X PBS. FAM-FLICA-positive cells were quantified by FACS (excitation: 492 nm, emission: 520 nm). FACS data were analyzed with FlowJo. To examine whether cell death occurs via activation of caspase 8 or caspase 9, Z-IETD-FMK (a caspase 8 inhibitor) and Z-LEHD-FMK TFA (a caspase 9 inhibitor) were dissolved in DMSO at 50 mM (Compound Management Center at St. Jude Children's Research Hospital). After cells were incubated with mCherry-Cre lentivirus for 8 hours, these compounds were added at a series of concentrations, and cells were incubated for 3 days before FACS analysis. For all inhibitor treatment assays, medium containing the inhibitors was changed every 2 days.

Instrument
S3e Cell Sorter

Software
ProSort software

Cell population abundance
About 2-5 million cells were used in each sort. The GFP-positive fraction was about 80-90% of the population, depending on the genotype.
FACS-based cell death assay: More than 1 × 10^4 cells were analyzed in each sample. The FAM-FLICA-positive fraction ranged from close to 0% to around 15%, depending on the genotype and the presence or absence of Cre recombinase.